By the end of the lab, you will be able to …
Download and open code-along-02.qmd
Load the standard packages.
Install and load the summarytools package.
Operators in R are symbols that direct R to perform various kinds of mathematical, logical, and decision operations. A few of the key ones to know before we get started:
To test equality or inequality:
==, !=, >, >=, <, <=
To indicate “and”, “or”, and “not”:
& | !
To assign values to data objects:
<- -> =
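As a quick illustration of these operators, here is a minimal sketch. The objects x, y, and z are made-up values for demonstration and are not part of the lab data.

```r
# Assignment: all three forms store a value in an object
x <- 5
10 -> y
z = 15

# Equality and inequality tests return TRUE or FALSE
x == 5      # TRUE
x != y      # TRUE
y >= z      # FALSE

# Combine tests with "and" (&), "or" (|), and "not" (!)
x < y & y < z    # TRUE
x > y | y < z    # TRUE
!(x == 5)        # FALSE
```

In practice, <- is the form most R style guides recommend for assignment.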
Functions are (most often) verbs, followed by what they will be applied to in parentheses:
Remember, you can access the variables (i.e., columns) using the $ operator, as shown using the table() function.
The variable names are case sensitive. In this dataset, all variables are lowercase.
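To make the $ operator and case sensitivity concrete, here is a small sketch. The survey data frame and its columns are hypothetical stand-ins. The lab itself works with gss24.

```r
# Hypothetical mini data set (made up for illustration)
survey <- data.frame(
  sex = c(1, 2, 2, 1, 2),
  age = c(34, 51, 27, 45, 62)
)

# Access a column with $ and count how often each value occurs
table(survey$sex)

# Names are case sensitive: "sex" exists, "Sex" does not
"sex" %in% names(survey)   # TRUE
"Sex" %in% names(survey)   # FALSE
```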
195 respondents were coded as 2 on this variable. What does that mean?
Political polarization is high in the U.S. today and attitudes about gender and family behavior have been heavily debated.
Using the most recent survey, do more liberals than conservatives think sex before marriage is ‘not wrong at all’?
How do we find out?
Let’s familiarize ourselves with the premarsx and polviews variables.
In the console, type ?premarsx and hit enter. The Help pane will show you the question text, response options and values.
Now, do the same for polviews.
Run this code to see the frequency table for the premarsx variable. Then, add a line below to also see a table for the polviews variable.
The table command also lets you create a table with two variables.
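A two-variable table might look like the sketch below. The toy data frame and its coded values are invented for illustration. With the lab data you would pass gss24$premarsx and gss24$polviews instead.

```r
# Hypothetical coded responses (made up for illustration)
toy <- data.frame(
  premarsx = c(1, 4, 4, 2, 4, 1),
  polviews = c(7, 1, 4, 6, 2, 7)
)

# The first variable defines the rows, the second the columns
xtab <- table(toy$premarsx, toy$polviews)
xtab

# Count for one row/column combination
xtab["1", "7"]   # 2
```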
Use haven::as_factor to see the value labels instead of the value numbers. Then, do the same for polviews.
always wrong                    almost always wrong
357                             122
wrong only sometimes            not wrong at all
258                             1378
other                           iap
0                               1126
don't know                      I don't have a job
50                              0
dk, na, iap                     no answer
0                               6
not imputable                   refused
0                               0
skipped on web                  uncodeable
12                              0
not available in this release   not available in this year
0                               0
see codebook
0
extremely liberal               liberal
140                             421
slightly liberal                moderate, middle of the road
368                             1148
slightly conservative           conservative
381                             516
extremely conservative          don't know
186                             99
iap                             I don't have a job
0                               0
dk, na, iap                     no answer
0                               20
not imputable                   refused
0                               0
skipped on web                  uncodeable
30                              0
not available in this release   not available in this year
0                               0
see codebook
0
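The sketch below shows what as_factor() does to a labelled vector. It assumes the haven package is installed. The vector x and its labels are made up to mirror premarsx and are not the real GSS data.

```r
library(haven)

# Hypothetical labelled vector (values and labels made up for illustration)
x <- labelled(
  c(1, 2, 4, 4),
  labels = c("always wrong" = 1, "almost always wrong" = 2,
             "wrong only sometimes" = 3, "not wrong at all" = 4)
)

# as_factor() swaps the numeric codes for their value labels
f <- as_factor(x)
table(f)
```

Every label becomes a factor level, including labels that no observation uses, which is why the tables above list so many zero counts.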
Let’s clean up the levels for premarsx.
gss24$premarsx <- zap_missing(gss24$premarsx)
gss24$premarsx <- as_factor(gss24$premarsx)
table(gss24$premarsx)
always wrong            almost always wrong     wrong only sometimes
357                     122                     258
not wrong at all        other
1378                    0
Let’s get rid of the empty levels in premarsx.
always wrong            almost always wrong     wrong only sometimes
357                     122                     258
not wrong at all
1378
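The code for this step is not shown here. One common approach is base R's droplevels(), sketched below on a hypothetical factor named answers. forcats::fct_drop() is an alternative, and the lab may use either.

```r
# Hypothetical factor with a level that no observation uses
answers <- factor(
  c("always wrong", "not wrong at all", "not wrong at all"),
  levels = c("always wrong", "not wrong at all", "other")
)
table(answers)   # "other" appears with a count of 0

# droplevels() removes levels with zero observations
answers <- droplevels(answers)
table(answers)   # "other" is gone
```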
For polviews, let’s combine categories to ease interpretation. This is easiest when the levels are numeric.
Let’s remind ourselves which values correspond to each label.
[1] extremely liberal                 [2] liberal
140                                   421
[3] slightly liberal                  [4] moderate, middle of the road
368                                   1148
[5] slightly conservative             [6] conservative
381                                   516
[7] extremely conservative            [NA] don't know
186                                   99
[NA] iap                              [NA] I don't have a job
0                                     0
[NA] dk, na, iap                      [NA] no answer
0                                     20
[NA] not imputable                    [NA] refused
0                                     0
[NA] skipped on web                   [NA] uncodeable
30                                    0
[NA] not available in this release    [NA] not available in this year
0                                     0
[NA] see codebook
0
gss24 <- gss24 |>
  mutate(
    # Collapse the seven polviews codes into three groups
    pol3cat = case_when(
      polviews >= 1 & polviews <= 3 ~ "Liberal",
      polviews == 4                 ~ "Moderate",
      polviews >= 5 & polviews <= 7 ~ "Conservative",
      TRUE                          ~ NA_character_
    ),
    # Store the result as a factor with a meaningful level order
    pol3cat = factor(pol3cat,
                     levels = c("Liberal", "Moderate", "Conservative"))
  )
The pipe operator can be written as |> or %>%.
Always double check your work.
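One way to double check a recode is to cross-tabulate the original codes against the new categories. The sketch below uses made-up polviews codes and base R's cut() as a package-free stand-in for the case_when() recode, so it is an illustration of the check rather than the lab's own code.

```r
# Hypothetical polviews codes, including a missing value
polviews <- c(1, 2, 3, 4, 5, 6, 7, NA, 4, 6)

# cut() is a base R stand-in for the case_when() recode:
# (0,3] -> Liberal, (3,4] -> Moderate, (4,7] -> Conservative
pol3cat <- cut(polviews,
               breaks = c(0, 3, 4, 7),
               labels = c("Liberal", "Moderate", "Conservative"))

# Each original code should land in exactly one new category;
# useNA = "ifany" keeps missing values visible
check <- table(polviews, pol3cat, useNA = "ifany")
check
```

If any row has counts in more than one column, or a valid code ends up in the NA column, the recode needs another look.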
Make a frequency table. One of summarytools' main purposes is to help clean and prepare data for further analysis. Pay attention to the missing values. Then, do the same for premarsx.
Frequencies
gss24$pol3cat
Type: Factor
                     Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
------------------ ------ --------- -------------- --------- --------------
           Liberal    929     29.40          29.40     28.07          28.07
          Moderate   1148     36.33          65.73     34.69          62.77
      Conservative   1083     34.27         100.00     32.73          95.50
              <NA>    149                               4.50         100.00
             Total   3309    100.00         100.00    100.00         100.00
Frequencies
gss24$premarsx
Type: Factor
                             Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
-------------------------- ------ --------- -------------- --------- --------------
              always wrong    357     16.88          16.88     10.79          10.79
       almost always wrong    122      5.77          22.65      3.69          14.48
      wrong only sometimes    258     12.20          34.85      7.80          22.27
          not wrong at all   1378     65.15         100.00     41.64          63.92
                      <NA>   1194                              36.08         100.00
                     Total   3309    100.00         100.00    100.00         100.00
Using report.nas = FALSE suppresses the missing data.
The headings = FALSE parameter suppresses the heading section. Do the same for premarsx.
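To see where the % Valid and % Total columns come from, the base R sketch below rebuilds pol3cat from the frequencies reported above and computes both sets of percentages by hand. This is an illustration of the arithmetic, not how summarytools computes them internally.

```r
# Rebuild pol3cat from the frequencies shown in the table above
pol3cat <- factor(
  rep(c("Liberal", "Moderate", "Conservative", NA),
      times = c(929, 1148, 1083, 149)),
  levels = c("Liberal", "Moderate", "Conservative")
)

# % Valid: missing values are excluded from the denominator
valid_pct <- round(100 * prop.table(table(pol3cat)), 2)

# % Total: missing values count toward the denominator
total_pct <- round(100 * prop.table(table(pol3cat, useNA = "always")), 2)

valid_pct
total_pct
```

The two sets differ only in the denominator: 3160 valid responses versus all 3309 respondents.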
Based on your table, what percentage of respondents believe sex before marriage is ‘almost always wrong’?
Based on your table, what percentage of respondents believe sex before marriage is ‘always’ or ‘almost always wrong’?
The table() function gives us the frequencies.
                       Liberal Moderate Conservative
  always wrong              32       78          229
  almost always wrong       21       44           51
  wrong only sometimes      58       91          100
  not wrong at all         505      488          331
We want to add the column percentages…
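Column percentages divide each cell by its column total. The base R sketch below reproduces that calculation from the counts in the table above. The lab's own output appears to come from summarytools, so treat this as an illustration of the arithmetic only.

```r
# Counts copied from the two-way table above
counts <- matrix(
  c( 32,  78, 229,
     21,  44,  51,
     58,  91, 100,
    505, 488, 331),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    premarsx = c("always wrong", "almost always wrong",
                 "wrong only sometimes", "not wrong at all"),
    pol3cat  = c("Liberal", "Moderate", "Conservative")
  )
)

# margin = 2 divides each cell by its column total
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

Using margin = 1 instead would give row percentages, which answer a different question (the political makeup of each response group).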
What’s your conclusion to our initial research question?
% who think sex relations before marriage is __________, by political views
Cross-Tabulation, Column Proportions
premarsx * pol3cat
Data Frame: gss24
---------------------- --------- -------------- -------------- -------------- ---------------
                         pol3cat        Liberal       Moderate   Conservative           Total
              premarsx
          always wrong              32 (  5.2%)    78 ( 11.1%)   229 ( 32.2%)    339 ( 16.7%)
   almost always wrong              21 (  3.4%)    44 (  6.3%)    51 (  7.2%)    116 (  5.7%)
  wrong only sometimes              58 (  9.4%)    91 ( 13.0%)   100 ( 14.1%)    249 ( 12.3%)
      not wrong at all             505 ( 82.0%)   488 ( 69.6%)   331 ( 46.6%)   1324 ( 65.3%)
                 Total             616 (100.0%)   701 (100.0%)   711 (100.0%)   2028 (100.0%)
---------------------- --------- -------------- -------------- -------------- ---------------